Importance Sampling for Fair Policy Selection
Authors
Abstract
We consider the problem of off-policy policy selection in reinforcement learning: using historical data generated from running one policy to compare two or more policies. We show that approaches based on importance sampling can be unfair—they can select the worse of two policies more often than not. We give two examples where the unfairness of importance sampling could be practically concerning. We then present sufficient conditions to theoretically guarantee fairness and a related notion of safety. Finally, we provide a practical importance sampling-based estimator to help mitigate one of the systematic sources of unfairness resulting from using importance sampling for policy selection.
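The claim that importance sampling can pick the worse of two policies more often than not can be illustrated with a small toy problem. The sketch below is not taken from the paper: the two-action setup, the behavior-policy probabilities, the rewards, and the function names are all assumptions chosen for illustration. Policy A always plays a rarely sampled action and has the higher true value, yet with five samples the ordinary importance sampling estimate ranks policy B higher with probability roughly 0.59.

```python
from math import comb

# Hypothetical one-step problem; every number here is an illustrative assumption.
P_BEHAVIOR = {"a1": 0.1, "a2": 0.9}   # action probabilities of the behavior policy
REWARD = {"a1": 1.0, "a2": 0.5}       # deterministic reward of each action
# Policy A always plays a1 (true value 1.0); policy B always plays a2 (true value 0.5).


def is_estimate(target_action, actions):
    """Ordinary importance sampling estimate for a deterministic target policy."""
    total = 0.0
    for a in actions:
        # Importance weight: target probability divided by behavior probability.
        weight = (1.0 if a == target_action else 0.0) / P_BEHAVIOR[a]
        total += weight * REWARD[a]
    return total / len(actions)


def prob_worse_policy_selected(n):
    """Exact probability that B's estimate exceeds A's over n logged actions."""
    prob = 0.0
    for k in range(n + 1):  # k = number of times a1 appears in the data
        actions = ["a1"] * k + ["a2"] * (n - k)
        if is_estimate("a2", actions) > is_estimate("a1", actions):
            prob += comb(n, k) * P_BEHAVIOR["a1"] ** k * P_BEHAVIOR["a2"] ** (n - k)
    return prob


if __name__ == "__main__":
    print(prob_worse_policy_selected(5))
```

In this toy setup the estimate for policy A is unbiased but takes the value zero whenever a1 never appears in the data, which happens with probability 0.9^5 for five samples. That is why the worse policy wins the comparison most of the time.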
Similar resources
Tradeoff Negotiation: The Importance of Getting in the Game; Comment on “Swiss-CHAT: Citizens Discuss Priorities for Swiss Health Insurance Coverage”
Swiss-CHAT’s playful approach to public rationing can be considered in terms of deliberative process design as well as in terms of health policy. The process’ forced negotiation of trade-offs exposed unexamined driving questions, and challenged prevalent presumptions about health care demand and about conditions of public reasoning that enable transparent rationing. While the experiment provide...
Adaptive Importance Sampling with Automatic Model Selection in Value Function Approximation
Off-policy reinforcement learning is aimed at efficiently reusing data samples gathered in the past, which is an essential problem for physically grounded AI as experiments are usually prohibitively expensive. A common approach is to use importance sampling techniques for compensating for the bias caused by the difference between data-sampling policies and the target policy. However, existing o...
Optical simulation of a Popescu-Rohrlich Box
It is well known that the fair-sampling loophole in Bell test opened by the selection of the state to be measured can lead to post-quantum correlations. In this paper, we make the selection of the results after measurement, which opens the fair-sampling loophole too, and thus can lead to post-quantum correlations. This kind of result-selection loophole can be realized by pre- and post-selectio...
Incorporating Cost-Effectiveness Data in a Fair Process for Priority Setting Efforts; Comment on “Use of Cost-Effectiveness Data in Priority Setting Decisions: Experiences from the National Guidelines for Heart Diseases in Sweden”
Cost-effectiveness data is useful in priority setting decisions for improving the efficiency of resource use. This paper responds to Eckard et al., which addressed the use of cost-effectiveness data in the actual prioritization decisions in the Swedish national clinical guidelines for heart diseases. Based on a set of experiences on the use of economic evaluation in prior...
Residents’ Satisfaction with Adequacy of Facilities in Metropolitan Ibadan, Nigeria
The study examined the quantity and quality of infrastructure in Ibadan, Nigeria, with a view to using the information to provide policy guidelines for sustainable infrastructural development. Using a stratified sampling technique, a total of fifteen wards from the five local government areas in Ibadan metropolis were selected for study. The selection of all the local government areas is based on th...